Computer file processing

ABSTRACT

A method and apparatus for processing of computer files. An embodiment of a method for processing computer files includes receiving a serial data stream input, where the serial data stream input represents a set of computer files. The method further includes scanning the serial data stream input to extract selected data elements occurring in the set of computer files, and outputting the selected data elements in a serial data stream output.

RELATED APPLICATIONS

This application is related to and claims priority to U.S. provisional patent application 60/953,933, filed Aug. 3, 2007.

This application is further related to:

-   U.S. patent application Ser. No. 11/648,065, entitled “Computer File System Traversal”, filed Dec. 30, 2006;
-   U.S. patent application Ser. No. ______, entitled “Computer Archive Traversal”, attorney docket 6570P472, filed Aug. 1, 2008, claiming priority to U.S. provisional application 60/953,932, filed Aug. 3, 2007;
-   U.S. patent application Ser. No. ______, entitled “Annotation Processing of Computer Files”, attorney docket 6570P474, filed Aug. 1, 2008, claiming priority to U.S. provisional application 60/953,935, filed Aug. 3, 2007;
-   U.S. patent application Ser. No. ______, entitled “Annotation Data Filtering of Computer Files”, attorney docket 6570P475, filed Aug. 1, 2008, claiming priority to U.S. provisional application 60/953,937, filed Aug. 3, 2007;
-   U.S. patent application Ser. No. ______, entitled “Annotation Data Handlers for Data Stream Processing”, attorney docket 6570P476, filed Aug. 1, 2008, claiming priority to U.S. provisional application 60/953,938, filed Aug. 3, 2007;
-   U.S. patent application Ser. No. ______, entitled “Dependency Processing of Computer Files”, attorney docket 6570P492, filed Aug. 1, 2008, claiming priority to U.S. provisional application 60/953,963, filed Aug. 3, 2007; and
-   U.S. patent application Ser. No. ______, entitled “Data Listeners for Type Dependency Processing”, attorney docket 6570P493, filed Aug. 1, 2008, claiming priority to U.S. provisional application 60/953,964, filed Aug. 3, 2007.

TECHNICAL FIELD

Embodiments of the invention generally relate to the field of computer systems and, more particularly, to a method and apparatus for computer file processing.

BACKGROUND

In computer operations, computer files, including Java class files, may contain various different types of elements. For example, there may be many different variations in data classifications within the files, such as instances of particular classes. Further, there may be elements such as annotations or other elements that contain additional data or metadata.

In certain circumstances, it may be necessary to identify which data elements are contained within the computer files. Because the data elements generally would not be indexed or otherwise identified in the files, it may be necessary to process the class files to search for the desired elements.

However, the structure of the files may make processing difficult. In one example, a set of class files may be in the form of a file hierarchy, which may not be easily searchable in an efficient manner. In another example, the files may be contained in an archive, which requires additional effort in the need to expand files to obtain the archived data. As a result, the identification of elements within the computer files may require significant processing time.

SUMMARY OF THE INVENTION

A method and apparatus are provided for computer file processing.

In a first aspect of the invention, an embodiment of a method includes receiving a serial data stream input, where the serial data stream input represents a set of computer files. The serial data stream input is scanned to extract selected data elements occurring in the set of computer files, and the selected data elements are output in a serial data stream output.

In a second aspect of the invention, an embodiment of a system includes a data input, where the data input is to receive a serial data stream input, the serial data stream input representing a set of computer files. The system further includes a processing module, where the processing module is to scan the serial data stream input to identify one or more elements in the set of computer files. The system also includes a data output, with the data output providing an extracted serial data stream output representing the identified elements of the input data stream.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

FIG. 1 is an illustration of an embodiment of a file traversal process to produce a serial data stream;

FIG. 2 a is an illustration of a file hierarchy including an archive that may be processed in an embodiment of the invention;

FIG. 2 b is an illustration of an archive that may be processed in an embodiment of the invention;

FIG. 3 is an illustration of an embodiment of processing of class file data;

FIG. 4 is an illustration of a class file data processing system;

FIG. 5 is a flowchart to illustrate an embodiment of class file stream processing;

FIG. 6 is an embodiment of library utilities;

FIG. 7 is an illustration of a computer system in an embodiment of the invention; and

FIG. 8 illustrates an embodiment of a client-server network system.

DETAILED DESCRIPTION

Embodiments of the invention are generally directed to computer file processing.

As used herein:

“Computer file” means any file structure used in a computer system. Computer files include files with specific required structures, including Java class files.

“Class file” means a Java class file. A Java class file is a defined format for compiled Java code, which may then be loaded and executed by any Java virtual machine. The format and structure for a Java class file are provided in JSR (Java Specification Request)-000202, Java Class File Specification Update (Oct. 2, 2006) and subsequent specifications.

“Traversal” means a process for progressing through the elements of a computer system, including a process for progressing through the elements of a computer archive.

“Archive” means a single file that may contain one or more separate files. An archive may also be empty. The files within an archive are extracted, or separated, from the archive for use by a computer program. The files contained within an archive are commonly compressed, and the compressed files are decompressed prior to use. An archive may further include data required to extract the files from the archive. “Archive” may also refer to the act of transferring one or more files into an archive.

“Compression” means the conversion of data into a form that requires less storage space. The term “compression” includes the use of any known compression algorithm. “Compression” also may commonly be referred to as “zipping” a file. The reverse process to compression is decompression, or expansion, of the compressed data back into a usable form. Compressed data is decompressed (expanded or unzipped) prior to use. Compression includes both lossy compression, in which data is lost in the process of compression, and lossless compression, in which no data is lost in the process of compression.

In an embodiment of the invention, computer files are processed in the form of a data stream. The computer files may include, but are not limited to, Java class files. In an embodiment, the computer files are converted to a serial data stream input for processing, and the processing of the computer files is conducted with the data remaining in the data stream form.

In an embodiment of the invention, a set of computer files is processed in a single pass as a serial data stream. In an embodiment, the serial data stream form is maintained both on input and output, thereby allowing further processing of class files without further file conversion. In an embodiment, the same data format is used for the data input and the data output. In an embodiment, the data stream conversion allows processing without any dependency on random access files, and broadens the applicable scope of the process for the input. In an embodiment, the processing of class files as a data stream allows processing without requiring use of, for example, Java library utilities that may normally be required to conduct the file processing.

In an embodiment of the invention, a system includes a serial data processing module for scanning received data. In an embodiment, the processing module receives computer files in the form of a serial data stream, and outputs an extracted data stream. In an embodiment, the processing module processes the files in a single pass, without requiring multiple readings of the file data.
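By way of illustration only, and not limitation, the single-pass behavior with a serial stream on both input and output might be sketched in Java as follows. The record framing (a UTF name, a payload length, and the payload bytes) and the names `SerialStreamFilter`, `writeRecord`, and `filter` are assumptions made for this sketch and are not taken from the embodiments. The sketch shows that, because the output uses the same format as the input, an extracted stream can be passed to a further scanner without conversion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

/** Hypothetical single-pass filter; the record framing is an assumption for illustration. */
final class SerialStreamFilter {

    /** Appends one record (name, payload length, payload) to the stream. */
    static void writeRecord(DataOutputStream out, String name, byte[] payload) throws IOException {
        out.writeUTF(name);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /**
     * Reads each record from the input exactly once and copies the selected
     * records to the output in the same framing. Returns the number copied.
     */
    static int filter(InputStream in, OutputStream out, Predicate<String> selected) throws IOException {
        DataInputStream din = new DataInputStream(in);
        DataOutputStream dout = new DataOutputStream(out);
        int copied = 0;
        while (true) {
            String name;
            try {
                name = din.readUTF();
            } catch (EOFException endOfStream) {
                break; // no further records in the stream
            }
            byte[] payload = new byte[din.readInt()];
            din.readFully(payload);
            if (selected.test(name)) {
                writeRecord(dout, name, payload);
                copied++;
            }
        }
        dout.flush();
        return copied;
    }

    /** Filters a small in-memory stream twice; the first output is the second input. */
    static int demo() {
        try {
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            DataOutputStream writer = new DataOutputStream(raw);
            writeRecord(writer, "com/example/A.class", new byte[] {1, 2, 3});
            writeRecord(writer, "README.txt", new byte[] {4});
            writeRecord(writer, "com/example/B.class", new byte[] {5, 6});
            writer.flush();

            ByteArrayOutputStream first = new ByteArrayOutputStream();
            filter(new ByteArrayInputStream(raw.toByteArray()), first, n -> n.endsWith(".class"));

            ByteArrayOutputStream second = new ByteArrayOutputStream();
            return filter(new ByteArrayInputStream(first.toByteArray()), second, n -> n.endsWith("B.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch no record is read more than once and no random access to the input is needed, so the input could equally be a network or pipe stream.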

In an embodiment of the invention, the conversion of computer files to a data stream allows for the use of a protocol for both the data producer (the computer file processor) and the data consumer without creating a complete file representation, thereby simplifying the data structure. In an implementation for Java class files, the processing system operates with a class file data model, without requiring the addition of any major abstraction for data processing.

In an embodiment, the conversion of computer files to a serial data format may include, but is not limited to, the operation of a traversal of a hierarchical data structure or of a data archive as provided respectively in U.S. patent application Ser. No. 11/648,065, entitled “Computer File System Traversal”, filed Dec. 30, 2006. Other processes for conversion of a set of files to a serial data stream may also be utilized in embodiments of the invention.

In an embodiment of the invention, for the processing of computer files it is assumed that processing occurs on an inner loop for critical processing stages. In an embodiment, a system provides high performance for inner loop class file processing.

In an embodiment of the invention, processing is designed to provide sufficient performance for overall computer file processing. For example, in an embodiment a system includes stream buffering to buffer data as it is obtained and processed. In addition, an embodiment of the invention provides a compact internal file state in the data stream, thereby minimizing the amount of data that will be required in the process of transferring and processing the computer files.

In an embodiment of the invention, a dedicated, independent processing module is provided. In an embodiment, the processing module may be utilized to identify type dependency data or annotation data in a serial data stream. A similar design and implementation may be utilized for either type of data.

In an embodiment of the invention, a file processor may be provided in multiple implementations, depending on the system requirements. In one example, native processing implementations may be provided for a computer file processor, with the native implementations being based upon relevant Java standards. In another example, a non-native implementation may be provided, as required. A particular non-native implementation may include a BCEL (Byte Code Engineering Library) implementation, with the BCEL API being a toolkit for the static analysis and dynamic creation or transformation of Java class files.

In an embodiment of the invention, a data consumer is a main framework expansion point for which neutral utility implementations might be required. In an embodiment of the invention, a file processor (the data producer) operates using the same data protocol as the data consumer protocol. In an embodiment of the invention, the data consumer may have control over the data to be provided to the data consumer. In an embodiment, the data producer and the data consumer may cooperate to agree on the data to be provided from the serial data stream. In an embodiment of the invention, a system may include complexity control, including configuring the file processor to deliver the data of interest. In an embodiment, the data of interest includes data meeting a certain degree of detail, or certain types of data. In an embodiment of the invention, the structure of the data processing may allow for a system to be utilized with loose semantics and implementation constraints. For example, the technical framework and protocol data types may be defined. However, there may be leeway for implementation characteristics, such as the result order sequence and analysis capabilities.
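By way of illustration only, one possible way for a data consumer to control the data delivered by the file processor is sketched below in Java. Every name in the sketch (`ElementKind`, `ElementConsumer`, `kindsOfInterest`, `deliver`, and so on) is hypothetical and chosen for this example; the embodiments define no such API. The sketch shows the consumer declaring the kinds of data it wants and the producer delivering only those kinds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical producer/consumer protocol; all names are illustrative assumptions. */
final class ProducerConsumerProtocol {

    enum ElementKind { ANNOTATION, TYPE_DEPENDENCY }

    /** One extracted element: its kind and a textual value. */
    static final class Element {
        final ElementKind kind;
        final String value;

        Element(ElementKind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Consumer side of the protocol: states what it wants, then receives it. */
    interface ElementConsumer {
        Set<ElementKind> kindsOfInterest();

        void onElement(Element element);
    }

    /** A consumer interested only in annotations; it records their values. */
    static final class AnnotationCollector implements ElementConsumer {
        final List<String> values = new ArrayList<>();

        @Override
        public Set<ElementKind> kindsOfInterest() {
            return EnumSet.of(ElementKind.ANNOTATION);
        }

        @Override
        public void onElement(Element element) {
            values.add(element.value);
        }
    }

    /** Producer side: delivers only the kinds the consumer asked for. */
    static int deliver(Iterable<Element> scanned, ElementConsumer consumer) {
        Set<ElementKind> wanted = consumer.kindsOfInterest();
        int delivered = 0;
        for (Element element : scanned) {
            if (wanted.contains(element.kind)) {
                consumer.onElement(element);
                delivered++;
            }
        }
        return delivered;
    }

    /** Delivers three scanned elements to a collector that wants annotations only. */
    static List<String> demo() {
        List<Element> scanned = Arrays.asList(
                new Element(ElementKind.ANNOTATION, "javax.ejb.Stateless"),
                new Element(ElementKind.TYPE_DEPENDENCY, "java.util.List"),
                new Element(ElementKind.ANNOTATION, "java.lang.Deprecated"));
        AnnotationCollector collector = new AnnotationCollector();
        deliver(scanned, collector);
        return collector.values;
    }
}
```

The order in which elements reach the consumer is left to the producer in this sketch, consistent with the leeway for result order described above.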

In an embodiment of the invention, file processing may be included within a set of tools that are provided to search files. The tools may, for example, provide for the conversion of files into serial form by a traversal process, the scanning of data for desired elements, and other related processes.

In a particular embodiment of the invention, a process is applied to Java class files contained within a hierarchical file structure or within a Java archive (or JAR file), including class files for J2EE systems (Java 2, Enterprise Edition). In an embodiment, the output of the process for traversal of the hierarchical file structure or archive is a class file data stream. An embodiment may utilize Java under the JDK (Java Development Kit) 5.0, including the JSR-175 recommendation regarding code annotations. In an embodiment of the invention, the class files may contain elements such as annotations or occurrences of particular class types, and there may be a need to extract these elements from the class files. In an embodiment of the invention, the class files are converted to a serial data stream format for the input to a processing module, and the processing module scans the serial data stream and extracts the desired elements.
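As a minimal sketch of reading class file data serially, the fixed header at the start of a Java class file can be consumed from a stream without random access. The class file format begins with a four-byte magic number (0xCAFEBABE), followed by two-byte minor and major version numbers and a two-byte constant pool count; major version 49 corresponds to JDK 5.0. The class name `ClassFileHeader` and its methods are assumptions for this example only, and the sketch stops before the constant pool, where annotation and type data would be found.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical reader for the fixed header of a Java class file. */
final class ClassFileHeader {

    /** Magic number that starts every class file. */
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads the header fields in stream order; the stream is left at the constant pool. */
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        if (din.readInt() != MAGIC) {
            throw new IOException("not a class file: bad magic number");
        }
        int minor = din.readUnsignedShort();
        int major = din.readUnsignedShort();
        int constantPoolCount = din.readUnsignedShort();
        return new ClassFileHeader(minor, major, constantPoolCount);
    }

    /** Convenience wrapper for in-memory data. */
    static ClassFileHeader parse(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```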

FIG. 1 is an illustration of an embodiment of a file processing system to process a serial data stream input. In this illustration, a computer file hierarchy or archive 110 is provided to a traversal module 120. In a particular implementation of a file processing system, the serial data may be generated from a file traversal process, as illustrated, but embodiments of the invention are not limited to any particular type of process to generate the serial data stream. The file system may include certain elements that may be sought, including, but not limited to, annotations for Java files.

In an embodiment of the invention, the traversal module 120 walks through the file structure or archive. Using only the names of the elements, the traversal module 120 may make a determination whether to process or skip each element of the archive. In an embodiment of the invention, the archive traversal module processes only portions of interest, and does not process any element more than once.

In an embodiment of the invention, the traversal module 120 then outputs a serial data stream 130 representing the elements of interest in the file structure or archive 110. In an embodiment, the data stream 130 may be used for any purpose, including the provision of the data to a data stream processing module 140. In an embodiment of the invention, the processing module 140 may be intended to process the archive in a serial form to, for example, search for certain elements in the portions of interest in the archive. The processing module 140 may then produce a data output 150 that, for example, includes information regarding elements that were found in the archive.

FIGS. 2 a and 2 b illustrate respectively a file hierarchy and an archive that may be converted to a serial data stream, such as by the operation of the traversal module 120 illustrated in FIG. 1. FIG. 2 a is an illustration of a file hierarchy including an archive that may be processed in an embodiment of the invention. In this illustration, a hierarchical file system 200 may include a root node 202, with the root node 202 having one or more branches below in the hierarchy. For example, a branch may include a file 204, which would be a “leaf node” as it terminates the branch. In another example, the branch 206 may include one or more nodes below, which in this case are shown as file 208 and file 210. These elements are leaf nodes in this example, but other branches may exist below in the hierarchy.

In addition, a branch may be an archive 212, the archive containing one or more files (unless the archive is empty). The file hierarchy 200 may include any number of archives, with the archives existing at any point in the hierarchy. In an embodiment of the invention, the file hierarchy may be subject to processing for a file system. In an embodiment, the operation may be transferred to a separate processing to address the computer archive when it is encountered. In an embodiment, after completion of processing of the archive the operation may return to the original processing.
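By way of illustration only, the hand-off from file hierarchy processing to separate archive processing, followed by a return to the hierarchy, might be sketched in Java as follows. The class `HierarchyWalk`, its method names, the `.class` and `.jar` name tests, and the `!/` separator in the result labels are assumptions for this sketch. The standard `Files.walkFileTree` and `ZipInputStream` facilities are used here for brevity and are not necessarily the mechanism of any embodiment.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical hierarchy walk that hands archives to a separate routine. */
final class HierarchyWalk {

    /** Visits each file below root once and returns the sorted class file names. */
    static List<String> findClassFiles(Path root) throws IOException {
        List<String> found = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                String relative = root.relativize(file).toString().replace('\\', '/');
                if (relative.endsWith(".class")) {
                    found.add(relative);
                } else if (relative.endsWith(".jar")) {
                    walkArchive(file, relative, found); // transfer to archive processing
                }
                return FileVisitResult.CONTINUE;        // return to hierarchy processing
            }
        });
        Collections.sort(found); // visiting order of siblings is not specified
        return found;
    }

    /** Separate processing for an archive encountered in the hierarchy. */
    private static void walkArchive(Path archive, String label, List<String> found) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().endsWith(".class")) {
                    found.add(label + "!/" + entry.getName());
                }
            }
        }
    }

    /** Builds a small temporary hierarchy with one archive and walks it. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("walk-demo"); // left for the OS to clean up
            Files.createDirectories(root.resolve("pkg"));
            Files.write(root.resolve("pkg").resolve("A.class"), new byte[] {0});
            Files.write(root.resolve("notes.txt"), new byte[] {0});
            try (OutputStream out = Files.newOutputStream(root.resolve("lib.jar"));
                 ZipOutputStream zout = new ZipOutputStream(out)) {
                zout.putNextEntry(new ZipEntry("q/B.class"));
                zout.write(1);
                zout.closeEntry();
            }
            return findClassFiles(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```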

FIG. 2 b is an illustration of an archive that may be processed in an embodiment of the invention. The archive 220 may, in one example, be an archive encountered in the processing of a file system, such as archive 212 in the processing of the hierarchical file system 200. A non-empty archive will contain one or more files or archives (each such archive being an archive within an archive, or an inner archive). In this illustration, archive 220 contains file 222, file 232, and file 242, but also contains inner archive 224 and inner archive 234. Archive 224 contains one or more files or archives, which are shown here as file 226, file 228, and file 230. Archive 234 contains one or more files or archives, which are shown here as file 236, file 238, and file 240.

In an embodiment of the invention, the contents of file hierarchy 200 or archive 220 are traversed, with the outcome of the traversal being a data stream representing selected portions of such contents. In an embodiment of the invention, the traversal addresses each element of the file hierarchy or archive no more than once. In an embodiment of the invention, the selection of elements to process is based upon the names of the elements, thus preventing the need to enter archived elements, such as to decompress such elements, if the elements will not be processed.
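By way of illustration only, name-based selection over an archive that contains inner archives might be sketched in Java as follows, reading the archive strictly as a stream. The class `NameBasedArchiveWalk`, the suffix tests, and the `!/` label separator are assumptions for this sketch. Each entry is addressed once, and the process-or-skip decision uses only the entry name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical streaming archive walk with name-based selection. */
final class NameBasedArchiveWalk {

    /** Returns the names of all class file entries, including those in inner archives. */
    static List<String> selectClassEntries(InputStream archive) throws IOException {
        List<String> selected = new ArrayList<>();
        walk(new ZipInputStream(archive), "", selected);
        return selected;
    }

    private static void walk(ZipInputStream zin, String prefix, List<String> selected) throws IOException {
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            String name = entry.getName();
            if (entry.isDirectory()) {
                continue;
            }
            if (name.endsWith(".class")) {
                selected.add(prefix + name);
            } else if (name.endsWith(".jar") || name.endsWith(".zip")) {
                // Inner archive: read it from the current entry. The inner stream is
                // deliberately not closed, since closing it would close the outer one.
                walk(new ZipInputStream(zin), prefix + name + "!/", selected);
            }
            // Any other entry is skipped on the basis of its name alone.
        }
    }

    /** Test helper: builds an in-memory archive from name/content pairs. */
    static byte[] zip(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zout.putNextEntry(new ZipEntry(e.getKey()));
                zout.write(e.getValue());
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Walks an archive holding two class files, a text file, and one inner archive. */
    static List<String> demo() {
        try {
            Map<String, byte[]> inner = new LinkedHashMap<>();
            inner.put("b/Inner.class", new byte[] {1});
            inner.put("b/readme.txt", new byte[] {2});
            Map<String, byte[]> outer = new LinkedHashMap<>();
            outer.put("a/Outer.class", new byte[] {3});
            outer.put("lib/inner.jar", zip(inner));
            outer.put("notes.txt", new byte[] {4});
            outer.put("z/Last.class", new byte[] {5});
            return selectClassEntries(new ByteArrayInputStream(zip(outer)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this sketch shows only the name-based decision. `ZipInputStream` still advances past the bytes of a skipped entry internally, so an implementation that must avoid that work entirely would need a different reading strategy.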

FIG. 3 is an illustration of an embodiment of processing of computer file data. In this illustration, conversion of computer file data is provided 305. The computer file data may be, but is not limited to, Java class file data. The conversion of the computer file data may include, but is not limited to, the traversal of a hierarchical file or archive, including the operation of traversal module 120 illustrated in FIG. 1. The output of the processing of computer file data is a serial data stream 310 representing the computer file data.

In an embodiment of the invention, the serial data stream 310 then is provided to a file processor or scanner 315, which processes the data, including scanning the data stream for data elements of interest. The file processor 315 may contain multiple modules or sub-modules, depending on the particular embodiment. The file processor 315 outputs an extracted data stream 320, which represents elements of the data stream that have been selected by the file processor 315. The extracted data stream 320 then is eventually provided to a consumer 325, which may be an entity or agent that requires the result of the scanning operation. The consumer 325 may receive additional reports or data processing as required for the needs of the consumer 325.

FIG. 4 is an illustration of a computer file processing system 400. While this illustration shows the processes occurring within a single system for simplicity in description, the processes may occur in multiple systems, including multiple systems within a network. In this illustration, a computer file data stream 405 is provided to a file processor 410, which scans the data for desired data elements. The data stream 405 may, for example, represent Java class file data that has been converted into a serial data stream. The file processor 410 may include multiple components, depending on the particular embodiment of the invention. The file processor 410 outputs extracted computer file data 415, which is presented to a consumer 420.

In an embodiment of the invention, the operation of the computer file processing system 400 is directed by certain inputs and settings. The operation of the file processor 410 may be directed by a scanner configuration 425. In addition, a data mode configuration 430 affects both the file processor 410 and the consumer 420. The file processor 410 also may include one of multiple implementations. In particular embodiments, the implementation may be a native implementation 435 or a BCEL (Byte Code Engineering Library) implementation. The BCEL implementation may include the Apache BCEL process 445, as developed by the Apache Software Foundation. In addition, the consumer 420 may utilize a framework utility 450 and a framework extension 455 in the operation of the computer file processing.

FIG. 5 is a flowchart to illustrate an embodiment of computer file stream processing. In this illustration, a serial class file data stream is received 505. The class file data stream may include, but is not limited to, a data stream resulting from the traversal of a hierarchical file system or an archive. In this process, a first data element is received 510. The data element is scanned and, based upon, for example, a data mode configuration, there is a determination whether the data element is needed 520. If the data element is not needed, then there is a determination whether any more data elements are remaining in the received data stream 530. If the data element is needed, the data element is output in an extracted data stream 525, followed by the determination whether any more data elements are remaining in the received data stream 530. If more data elements remain in the received data stream 530, then the next data element is received 515. If no more data elements remain in the received data stream, then the process ends 535. The processing of data elements may include other processes not illustrated here, depending on the embodiment of the invention.
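By way of illustration only, the flow of FIG. 5 might be sketched in Java as a single loop, with the reference numerals of the flowchart noted in comments. The class `ElementScanLoop`, the use of an `Iterator` for the received stream, and the string form of the data elements are assumptions for this sketch. The `needed` predicate stands in for whatever the data mode configuration would select.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical rendering of the FIG. 5 flow as one loop over the received stream. */
final class ElementScanLoop {

    /** Scans each received element once and outputs the needed ones; returns the count output. */
    static <E> int scan(Iterator<E> received, Predicate<? super E> needed, Consumer<? super E> extracted) {
        int emitted = 0;
        while (received.hasNext()) {        // 530: more data elements remaining?
            E element = received.next();    // 510 / 515: receive first or next element
            if (needed.test(element)) {     // 520: is the data element needed?
                extracted.accept(element);  // 525: output in the extracted data stream
                emitted++;
            }
        }
        return emitted;                     // 535: end of process
    }

    /** Extracts the annotation elements from a three-element stream. */
    static List<String> demo() {
        List<String> stream = Arrays.asList(
                "annotation:Stateless", "field:count", "annotation:Deprecated");
        List<String> output = new ArrayList<>();
        scan(stream.iterator(), e -> e.startsWith("annotation:"), output::add);
        return output;
    }
}
```

Unlike the flowchart, the loop tests for remaining elements before receiving the first one, so an empty stream simply ends the process.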

FIG. 6 is an embodiment of library utilities. FIG. 6 may illustrate software modules, hardware modules, or modules including a combination of software and hardware. In this illustration, the utilities relate to an interface layer comprising code walk interfaces (code.walk 680) for class file processing and file walk interfaces (file.walk 610) for locating files; and further to an implementation toolbox comprising code processing 650 and a code walk implementation (code.walk.impl 660) for class file processing, and file processing 655 and a file walk implementation (file.walk.impl 630) for locating files.

In the interface layer, the code walk interfaces 680 may include a class file annotation value interface module 682, a class file program element interface module 684, a class file annotation handler interface module 686, a class file annotation scanner interface module 688, a class file dependency scanner interface module 690, and a class file dependency listener interface module 692. The file walk interfaces then may include a file condition interface module 612, a file name classifier interface module 614, a directory walker handler interface module 616, a directory walker interface module 618, a zip walker handler interface module (“zip” indicating use for archives) 620, a zip walker interface module 622, and a file notification interface module 624.

In an embodiment of the invention, the code processing 650 may provide for parsing types from class file descriptors. Code processing 650 may include a class file format helper module 652 and a class file descriptor parser module. The code walk implementation 660 for class file processing may include a class file annotation record module 662, a class file element record module 664, a class file annotation filter 666, a class file annotation for native elements 668, a class file dependencies module for native elements 670, a class file dependencies module for BCEL (Byte Code Engineering Library) elements 672, a class file dependency concentrator module 674, and a class file dependency filter 676.
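By way of illustration only, parsing types from class file descriptors might be sketched in Java as follows. The descriptor syntax used is that of the Java class file format: single letters for primitive types (for example `I` for int and `D` for double), `V` for void, `L` followed by a class name and `;` for object types, `[` for each array dimension, and parameter types enclosed in parentheses for a method. The class `DescriptorParser` and its output format are assumptions for this sketch and do not represent the descriptor parser module of FIG. 6.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser turning a field or method descriptor into readable type names. */
final class DescriptorParser {

    private final String descriptor;
    private int pos;

    private DescriptorParser(String descriptor) {
        this.descriptor = descriptor;
    }

    /**
     * For a field descriptor, returns a single type. For a method descriptor,
     * returns the parameter types followed by the return type. The sketch is
     * lenient and does not reject 'V' in a parameter position.
     */
    static List<String> parse(String descriptor) {
        DescriptorParser p = new DescriptorParser(descriptor);
        List<String> types = new ArrayList<>();
        if (p.peek() == '(') {
            p.pos++;
            while (p.peek() != ')') {
                types.add(p.nextType());
            }
            p.pos++; // skip ')'
        }
        types.add(p.nextType());
        if (p.pos != descriptor.length()) {
            throw new IllegalArgumentException("trailing characters in " + descriptor);
        }
        return types;
    }

    private char peek() {
        if (pos >= descriptor.length()) {
            throw new IllegalArgumentException("truncated descriptor: " + descriptor);
        }
        return descriptor.charAt(pos);
    }

    private String nextType() {
        char c = peek();
        pos++;
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return nextType() + "[]";
            case 'L': {
                int end = descriptor.indexOf(';', pos);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class name in " + descriptor);
                }
                String name = descriptor.substring(pos, end).replace('/', '.');
                pos = end + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("unexpected '" + c + "' in " + descriptor);
        }
    }
}
```

For example, `DescriptorParser.parse("(ILjava/lang/String;[D)V")` yields `int`, `java.lang.String`, `double[]`, and `void`, in that order; the object type names obtained this way are the kind of data a type dependency scan would report.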

In an embodiment of the invention, the file processing 655 may include a comma separated value (CSV) formatter and a CSV scanner. The file walk implementation 630 for locating files may include a simple file condition module 632, a basic file name classifier module 634, a directory finder module 636, a directory walker implementation module 638, a walk recorder module 640, a zip (archive) condenser module 642, and a zip walker implementation module 644.
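By way of illustration only, a comma separated value formatter and scanner for a single line might be sketched in Java as follows. The specification does not state the quoting rules used, so the common convention of enclosing a field in double quotes when it contains a comma, a quote, or a line break, and doubling any embedded quote, is assumed here. The class `CsvLine` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-line CSV formatter and scanner; the quoting rules are an assumption. */
final class CsvLine {

    /** Joins fields with commas, quoting any field that needs it. */
    static String format(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            String field = fields.get(i);
            boolean needsQuotes = field.contains(",") || field.contains("\"")
                    || field.contains("\n") || field.contains("\r");
            if (needsQuotes) {
                line.append('"').append(field.replace("\"", "\"\"")).append('"');
            } else {
                line.append(field);
            }
        }
        return line.toString();
    }

    /** Splits one formatted line back into its fields in a single left-to-right pass. */
    static List<String> scan(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote stands for one quote
                    i++;
                } else {
                    inQuotes = false;    // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

A formatter and scanner of this kind would allow extracted elements, such as a class name and an annotation value, to be written out as a line of text and read back without loss.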

FIG. 7 is an illustration of a computer system in an embodiment of the invention. The computer system may be utilized as a system for processing of computer files in the form of a data stream, or may represent one of multiple systems used in such processing. The computing system illustrated in FIG. 7 is only one of various possible computing system architectures, and is a simplified illustration that does not include many well-known elements. As illustrated, a computing system 700 can execute program code stored by an article of manufacture. Computer system 700 may be a J2EE system, ABAP system, or administration system. A computer system 700 includes one or more processors 705 and memory 710 coupled to a bus system 720. The bus system 720 is an abstraction that represents any one or more separate physical buses, point-to-point connections, or both connected by appropriate bridges, adapters, or controllers. The bus system 720 may include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, a HyperTransport or industry standard architecture (ISA) bus, a small computer system interface (SCSI) bus, a universal serial bus (USB), or an Institute of Electrical and Electronics Engineers (IEEE) standard 1394 bus, sometimes referred to as “Firewire” (“Standard for a High Performance Serial Bus” 1394-1995, IEEE, published Aug. 30, 1996, and supplements thereto).

As illustrated in FIG. 7, the processors 705 are central processing units (CPUs) of the computer system 700 and control the overall operation of the computer system 700. The processors 705 execute software stored in memory 710. A processor 705 may be, or may include, one or more programmable general-purpose or special-purpose microprocessors, digital signal processors (DSPs), programmable controllers, application specific integrated circuits (ASICs), programmable logic devices (PLDs), or the like, or a combination of such devices.

Memory 710 is or includes the main memory of the computer system 700. Memory 710 represents any form of random access memory (RAM), read-only memory (ROM), flash memory, or the like, or a combination of such devices. Memory 710 stores, among other things, the operating system 715 of the computer system 700.

Also connected to the processors 705 through the bus system 720 are one or more mass storage devices 725 and a network adapter 735. Mass storage devices 725 may be or may include any conventional medium for storing large volumes of instructions and data 730 in a non-volatile manner, such as one or more magnetic or optical based disks. In an embodiment of the invention, the mass storage devices may include storage of files or an archive 732 that requires processing. In an embodiment of the invention, the processors 705 may operate to traverse the files or archive 732, the traversal of the files or archive 732 resulting in output of a serial data stream representing selected elements of the archive. The processor 705 may scan the serial stream for desired data elements within the computer files. In another embodiment the computer system 700 may provide for the conversion of the computer files into a serial data stream, while another system or systems is responsible for scanning the data stream for desired data elements.

The network adapter 735 provides the computer system 700 with the ability to communicate with remote devices, over a network 740 and may be, for example, an Ethernet adapter. In one embodiment, the network adapter may be utilized to output data including, for example, an extracted serial data stream representing selected elements of the files or archive 732.

FIG. 8 illustrates an embodiment of a client-server network system. As illustrated, a network 825 links a server 830 with client systems 805, 810, and 815. Client 815 may include certain data storage 820, including computer files in the form of, for example, a computer file hierarchy or computer archive 822. Server 830 includes a programmable data processing system suitable for implementing apparatus, programs, and/or methods in accordance with one or more embodiments of the present invention. Server 830 includes processor 835 and memory 840. Server 830 provides a core operating environment for one or more runtime systems, including, for example, virtual machine 845, at memory 840 to process user requests. Memory 840 may include a shared memory area that is accessible by multiple operating system processes executing in server 830. For example, virtual machine 845 may include an enterprise server (e.g., a J2EE-compatible server or node, Web Application Server developed by SAP AG, WebSphere Application Server developed by IBM Corp. of Armonk, N.Y., and the like). Memory 840 can be used to store an operating system, a Transmission Control Protocol/Internet Protocol (TCP/IP) stack for communicating over network 825, and machine executable instructions executed by processor 835. The memory 840 may also include data 850 for processing, including the processing of data that includes data of one or more computer file hierarchies or computer archives 852. In an embodiment, the data has been converted into a serial data stream for processing. In some embodiments, server 830 may include multiple processors, each of which can be used to execute machine executable instructions.

Client systems 805-815 may execute multiple applications or application interfaces. Each instance of an application or application interface may constitute a user session. Each user session may generate one or more requests to be processed by server 830. The requests may include instructions or code to be executed on a runtime system, such as virtual machine 845 on server 830.

In the description above, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form.

The present invention may include various processes. The processes of the present invention may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor or logic circuits programmed with the instructions to perform the processes. Alternatively, the processes may be performed by a combination of hardware and software.

Portions of the present invention may be provided as a computer program product, which may include a computer-readable medium having stored thereon computer program instructions, which may be used to program a computer (or other electronic devices) to perform a process according to the present invention. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disk read-only memory), and magneto-optical disks, ROMs (read-only memory), RAMs (random access memory), EPROMs (erasable programmable read-only memory), EEPROMs (electrically-erasable programmable read-only memory), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions. Moreover, the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer to a requesting computer.

Many of the methods are described in their most basic form, but processes can be added to or deleted from any of the methods, and information can be added to or subtracted from any of the described messages, without departing from the basic scope of the present invention. It will be apparent to those skilled in the art that many further modifications and adaptations can be made. The particular embodiments are not provided to limit the invention but to illustrate it. The scope of the present invention is not to be determined by the specific examples provided above but only by the claims below.

It should also be appreciated that reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature may be included in the practice of the invention. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims are hereby expressly incorporated into this description, with each claim standing on its own as a separate embodiment of this invention. 

1. A method for processing of computer files comprising: receiving a serial data stream input, the serial data stream input representing a set of computer files; scanning the serial data stream input to extract selected data elements occurring in the set of computer files; and outputting the selected data elements in a serial data stream output.
 2. The method of claim 1, wherein the computer files are Java class files.
 3. The method of claim 1, further comprising converting the set of computer files to generate the serial data stream input.
 4. The method of claim 1, wherein scanning the serial data stream input comprises scanning each element of the serial data stream input no more than once.
 5. The method of claim 1, wherein a selected data element in the computer files is an annotation.
 6. The method of claim 1, wherein a selected data element is an occurrence of a particular class of computer file.
 7. The method of claim 1, wherein a format of the serial data stream input is the same as a format of the serial data stream output.
 8. The method of claim 1, further comprising providing the serial data stream output to a data consumer.
 9. The method of claim 1, further comprising processing the serial data stream output to extract selected data elements from the contents of the serial data stream output.
 10. A system comprising: a data input, the data input to receive a serial data stream input, the serial data stream input representing a set of computer files; a processing module, the processing module to scan the serial data stream input to identify one or more elements in the set of computer files; and a data output, the data output providing an extracted serial data stream output representing the identified elements of the serial data stream input.
 11. The system of claim 10, further comprising a conversion module to convert the set of computer files to the serial data stream input.
 12. The system of claim 11, wherein the set of computer files comprises a set of Java class files.
 13. The system of claim 11, wherein the serial data stream input and the serial data stream output have the same data format.
 14. The system of claim 11, further comprising a second processing module, the second processing module receiving the serial data stream output as an input for additional processing.
 15. The system of claim 11, wherein an identified element in the computer files is an annotation.
 16. The system of claim 11, wherein an identified element in the computer files is an occurrence of a particular class of computer file.
 17. An article of manufacture comprising: a computer-readable medium including instructions that, when accessed by a processor, cause the computer to perform operations comprising: converting a set of computer files to generate a serial data stream input; scanning the serial data stream input to extract selected data elements occurring in the set of computer files; and outputting the selected data elements in a serial data stream output.
 18. The article of manufacture of claim 17, wherein each computer file is a Java class file.
 19. The article of manufacture of claim 17, wherein scanning the serial data stream input comprises scanning each element of the serial data stream input no more than once.
 20. The article of manufacture of claim 17, wherein a selected data element in the computer files is an annotation.
 21. The article of manufacture of claim 17, wherein a selected data element in the computer files is an occurrence of a particular class of computer file.
 22. The article of manufacture of claim 17, wherein a format of the serial data stream input is the same as a format of the serial data stream output.
 23. The article of manufacture of claim 17, wherein the medium further includes instructions that, when accessed by a processor, cause the computer to perform operations comprising: providing the serial data stream output to a data consumer.
 24. The article of manufacture of claim 17, wherein the medium further includes instructions that, when accessed by a processor, cause the computer to perform operations comprising: processing the serial data stream output to extract selected data elements from the contents of the serial data stream output. 
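
By way of illustration only, and without limiting the claims, the receiving, scanning, and outputting operations recited above might be sketched in Java as follows. The record layout used here (three UTF-8 strings per record: file name, element kind, element value), the class name StreamFilterSketch, and the method names extract, writeRecord, and sampleInput are hypothetical and do not appear in the specification; they are assumptions chosen to keep the sketch self-contained. The sketch reads each record of the input exactly once and writes the selected records in the same layout, so the output of one pass may serve as the input of another.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative sketch only; the record layout and all names are assumptions.
class StreamFilterSketch {

    // Hypothetical record layout: file name, element kind, element value.
    static void writeRecord(DataOutputStream out, String file, String kind, String value)
            throws IOException {
        out.writeUTF(file);
        out.writeUTF(kind);
        out.writeUTF(value);
    }

    // Scans the serial input in a single forward pass and copies only the
    // records whose kind matches selectedKind, using the same layout for the
    // output. Returns the number of records copied.
    static int extract(InputStream rawIn, OutputStream rawOut, String selectedKind)
            throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        DataOutputStream out = new DataOutputStream(rawOut);
        int copied = 0;
        while (true) {
            String file;
            try {
                file = in.readUTF();          // EOF here means no more records
            } catch (EOFException endOfStream) {
                break;
            }
            String kind = in.readUTF();
            String value = in.readUTF();
            if (kind.equals(selectedKind)) {
                writeRecord(out, file, kind, value);
                copied++;
            }
        }
        out.flush();
        return copied;
    }

    // Builds a small serial stream standing in for a converted set of files.
    static byte[] sampleInput() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeRecord(out, "A.class", "annotation", "@Deprecated");
        writeRecord(out, "A.class", "method", "run");
        writeRecord(out, "B.class", "annotation", "@Override");
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        int copied = extract(new ByteArrayInputStream(sampleInput()), result, "annotation");
        System.out.println(copied + " records extracted");
    }
}
```

Because the output uses the same layout as the input in this sketch, the bytes produced by one call to extract can be passed directly to a second call, for example to narrow the selection further or to hand the result to a separate data consumer.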